14 research outputs found

Performance Analysis of the SHA-3 Candidates on Exotic Multi-core Architectures

Author: D. Patterson
D.A. Osvik
H.P. Hofstee
J. Daemen
J.W. Bos
M. Bellare
M. Stevens
O. Takahashi
R. Benadjila
R. Szerwinski
S. Marechal
S.A. Manavski
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

The NIST hash function competition to design a new cryptographic hash standard 'SHA-3' is currently one of the hot topics in cryptologic research; its outcome heavily depends on the public evaluation of the remaining 14 candidates. There have been several cryptanalytic efforts to evaluate the security of these hash functions. Concurrently, invaluable benchmarking efforts have been made to measure the performance of the candidates on multiple architectures. In this paper we contribute to the latter; we evaluate the performance of all second-round SHA-3 candidates on two exotic platforms: the Cell Broadband Engine (Cell) and the NVIDIA Graphics Processing Units (GPUs). Firstly, we give performance estimates for each candidate based on the number of arithmetic instructions, which can be used as a starting point for evaluating the performance of the SHA-3 candidates on various platforms. Secondly, we use these generic estimates and Cell-/GPU-specific optimization techniques to give more precise figures for our target platforms. Finally, we present implementation results of all 10 non-AES based SHA-3 candidates.

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Cofactorization on Graphics Processing Units

Author: A. Moss
A.K. Lenstra
C. Pomerance
D. Loebenberger
D.A. Osvik
D.J. Bernstein
H. Hisil
H.M. Edwards
H.W. Lenstra Jr.
J. Gilger
J. Pelzl
J. Yang
J.M. Pollard
J.W. Bos
K. Gaj
M.O. Rabin
O. Harrison
P. Zimmermann
P.L. Montgomery
R. Szerwinski
R.P. Brent
S. Collange
T. Güneysu
T. Jebelean
T. Kleinjung
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

We show how the cofactorization step, a compute-intensive part of the relation collection phase of the number field sieve (NFS), can be farmed out to a graphics processing unit. Our implementation on a GTX 580 GPU, which is integrated with a state-of-the-art NFS implementation, can serve as a cryptanalytic co-processor for several Intel i7-3770K quad-core CPUs simultaneously. This allows those processors to focus on the memory-intensive sieving and results in more useful NFS relations found in less time.

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Cryptology ePrint Archive

GPU Accelerated Cryptography as an OS Service

Author: A. Menezes
D. Blythe
J. Yang
J.-J. Quisquater
M.C. Riffa
O. Harrison
R. Szerwinski
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

Crossref

A High-Performance Implementation of Differential Power Analysis on Graphics Cards

Author: A. Moradi
D.J. Bernstein
E. Brier
J. Sanders
O. Harrison
P.C. Kocher
R. Szerwinski
S. Mangard
S.J. Lee
Publication date: 01/01/2011

We present an implementation for Differential Power Analysis (DPA) that is entirely based on Graphics Processing Units (GPUs). In this paper we make use of advanced techniques offered by the CUDA Framework in order to minimize the runtime. In security testing, DPA still plays a major role for the smart card industry, and these evaluations require, apart from educationally prepared measurement setups, the analysis of measurements with large amounts of traces and samples, and here time does matter. Most often DPA implementations are tailor-made and adapted to fit certain platforms, and hence efficient reference implementations are sparsely seeded. In this work we show that the powerful architecture of graphics cards is well suited to facilitate a DPA implementation, based on the Pearson correlation coefficient, that could serve as a high-performance reference, e.g., by analyzing one million traces of 20k samples in less than two minutes.

CiteSeerX

Crossref

pub H-BRS - Publikationsserver der Hochschule Bonn-Rhein-Sieg

Acceleration of composite order bilinear pairing on graphics hardware

Author: A. Lewko
D. Boneh
D. Freeman
J. Katz
O. Harrison
P.S.L.M. Barreto
R. Szerwinski
S. Kawamura
S. Meiklejohn
Publication venue: Springer
Publication date: 25/04/2011

Recently, composite-order bilinear pairing has been shown to be useful in many cryptographic constructions. However, it is time-costly to evaluate. This is because the composite order should be at least 1024-bit and, hence, the elliptic curve group order n and base field become too large, rendering the bilinear pairing algorithm itself too slow to be practical (e.g., the Miller loop is Ω(n)). Thus, composite-order computation easily becomes the bottleneck of a cryptographic construction, especially in the case where many pairings need to be evaluated at the same time. The existing solution to this problem that converts composite-order pairings to prime-order ones is only valid for certain constructions. In this paper, we leverage the huge number of threads available on Graphics Processing Units (GPUs) to speed up composite-order pairing computation. We investigate suitable SIMD algorithms for base field, extension field, elliptic curve and bilinear pairing computation as well as mapping these algorithms into GPUs with careful considerations. Experimental results show that our method achieves a record of 8.7 ms per pairing on a 1024-bit security level, which is a 20-fold speedup compared to state-of-the-art CPU implementation. This result also opens the road to adopting higher security levels and using rich-resource parallel platforms, which for example are available in cloud computing. In fact, we can achieve more than 24 times speedup on a 2048-bit security level and a record of 7 × 10^-6 USD per pairing on the Amazon cloud computing environment.

CiteSeerX

Crossref

Cryptology ePrint Archive

HKU Scholars Hub

High-Speed Elliptic Curve Cryptography on the NVIDIA GT200 Graphics Processing Unit

Author: D.J. Bernstein
E. Lindholm
H. Hisil
J.W. Bos
P.L. Montgomery
R. Szerwinski
S. Antão
Publication date: 01/01/2014

Crossref

Open Repository and Bibliography - Luxembourg

Exploiting the Floating-Point Computing Power of GPUs for RSA

Author: A. Moss
D.E. Knuth
D.J. Bernstein
J.J. Quisquater
J.W. Bos
N. Koblitz
O. Harrison
P.L. Montgomery
R. Szerwinski
R.L. Rivest
S. Antão
S. Pu
Ç.K. Koç
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

Crossref

Efficient Acceleration of Asymmetric Cryptography on Graphics Hardware

Author: A. Menezes
D. Cook
D.E. Knuth
J. Yang
J.-J. Quisquater
K.C. Posch
N.S. Szabo
O. Harrison
P.L. Montgomery
R. Szerwinski
S. Fleissner
S. Kawamura
Publication date: 01/01/2009

Graphics processing units (GPUs) are increasingly being used for general purpose computing. We present implementations of large integer modular exponentiation, the core of public-key cryptosystems such as RSA, on a DirectX 10 compliant GPU. DirectX 10 compliant graphics processors are the latest generation of GPU architecture, which provides increased programming flexibility and support for integer operations. We present high performance modular exponentiation implementations based on integers represented in both standard radix form and residue number system form. We show how a GPU implementation of a 1024-bit RSA decrypt primitive can outperform a comparable CPU implementation by up to 4 times and also improve the performance of previous GPU implementations by decreasing latency by up to 7 times and doubling throughput. We present how an adaptive approach to modular exponentiation involving implementations based on both a radix and a residue number system gives the best all-around performance on the GPU both in terms of latency and throughput. We also highlight the usage criteria necessary to allow the GPU to reach peak performance on public-key cryptographic operations.

CiteSeerX

Crossref

OSPREY 3.0: Open‐source protein redesign for you, with powerful new features

Author: Abraham M. J.
Donald B. R.
Frey K. M.
Georgiev I.
Globerson A.
Hallen M. A.
He K.
Nvidia C.
Ojewole A. A.
Szerwinski R.
Wang X.
Publication venue: 'Wiley'

Crossref

Modular Resultant Algorithm for Graphics Processors

Author: E. Frantzeskakis
E. Savas
G.E. Collins
H. Hong
J. Llovet
K. Geddes
M. Monagan
O. Harrison
P. Emeliyanenko
R. Szerwinski
S. Chandrasekaran
T. Bubeck
T. Kailath
W.D. Hillis
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

Crossref

MPG.PuRe